library(tidyverse)
library(psych)
library(cowplot)
library(patchwork)
library(here)
library(brms) 
library(tidybayes) 

divergent transitions

From McElreath:

Recall that HMC simulates the frictionless flow of a particle on a surface. In any given transition, which is just a single flick of the particle, the total energy at the start should be equal to the total energy at the end. That’s how energy in a closed system works. And in a purely mathematical system, the energy is always conserved correctly. It’s just a fact about the physics.

But in a numerical system, it might not be. Sometimes the total energy is not the same at the end as it was at the start. In these cases, the energy is divergent. How can this happen? It tends to happen when the posterior distribution is very steep in some region of parameter space. Steep changes in probability are hard for a discrete physics simulation to follow. When that happens, the algorithm notices by comparing the energy at the start to the energy at the end. When they don’t match, it indicates numerical problems exploring that part of the posterior distribution.


centered parameterization

In his lecture, McElreath uses CENTERED PARAMETERIZATION to demonstrate divergent transitions. A very simple example:

\[\begin{align*} x &\sim \text{Normal}(0, exp(\nu)) \\ \nu &\sim \text{Normal}(0, 3) \\ \end{align*}\]

This expression is centered because one set of priors (the priors for \(x\)) are centered around another prior (the prior for \(\nu\)). It’s intuitive, but this can cause a lot of problems with Stan, which is probably why McElreath used this for his example. In short, when there is limited data within our groups or the population variance is small, the parameters \(x\) and \(\nu\) become highly correlated. This geometry is challenging for MCMC to sample. (Think of a long and narrow groove, not a bowl, for your Hamiltonian skateboard.)


set.seed(1)
# plot the likelihoods
ps <- seq( from=-4, to=4, length.out=200) # possible parameter values for both x and nu

crossing(nu = ps, x=ps) %>%  #every possible combination of nu and x
  mutate(
    likelihood_nu = dnorm(nu, 0, 3),
    likelihood_x  = dnorm(x, 0, exp(nu)),
    joint_likelihood = likelihood_nu*likelihood_x
  ) %>% 
  ggplot( aes(x=x, y=nu, fill=joint_likelihood) ) +
  geom_raster() + 
  scale_fill_viridis_c() +
  guides(fill = F)


The way to fix this is by using an uncentered parameterization:

\[\begin{align*} x &= z\times \text{exp}(\nu) \\ z &\sim \text{Normal}(0, 1) \\ \nu &\sim \text{Normal}(0, 3) \\ \end{align*}\]

set.seed(1)
# plot the likelihoods
ps <- seq( from=-4, to=4, length.out=200) # possible parameter values for both x and nu

crossing(nu = ps, z=ps) %>%  #every possible combination of nu and x
  mutate(
    likelihood_nu = dnorm(nu, 0, 3),
    likelihood_z  = dnorm(z, 0, 1),
    joint_likelihood = likelihood_nu*likelihood_z
  ) %>% 
  ggplot( aes(x=z, y=nu, fill=joint_likelihood) ) +
  geom_raster() +
  scale_fill_viridis_c() +
  guides(fill = F)


It’s an important point, except the issues of centered parameterization are so prevalent1, that brms generally doesn’t allow centered parameterization (with some exceptions). So we can’t recreate the divergent transition situation that McElreath demonstrates in his lecture.

McElreath describes the problem of fertility in Bangladesh as such:

\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha_{D_{[i]}} \\ \alpha_j &\sim \text{Normal}(\bar{\alpha}, \sigma) \\ \bar{\alpha} &\sim \text{Normal}(0, 1) \\ \sigma &\sim \text{Exponential}(1) \\ \end{align*}\]

But to fit this using brms, we’ll rewrite as:

\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha + \alpha_{D[i]} \\ \alpha &\sim \text{Normal}(0, 1) \\ \alpha_{D[j]} &\sim \text{Normal}(0, \sigma_{D}) \\ \sigma_{D} &\sim \text{Exponential}(1) \end{align*}\]


\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha + \alpha_{D[i]} \\ \alpha &\sim \text{Normal}(0, 1) \\ \alpha_{D[j]} &\sim \text{Normal}(0, \sigma_{D}) \\ \sigma_{D} &\sim \text{Exponential}(1) \end{align*}\]

data(bangladesh, package="rethinking")
d <- bangladesh

m1 <- brm(
  data=d,
  family=bernoulli,
  use.contraception ~ 1 + (1 | district),
  prior = c( prior(normal(0, 1), class = Intercept), # alpha bar
             prior(exponential(1), class = sd)),       # sigma

  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.1"))

m1
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: use.contraception ~ 1 + (1 | district) 
##    Data: d (Number of observations: 1934) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~district (Number of levels: 60) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.52      0.09     0.37     0.70 1.00     1374     1915
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.54      0.09    -0.72    -0.37 1.00     1998     2342
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

gather_draws(m1, b_Intercept, r_district[district, ]) %>% 
  with_groups(c(.variable, district), median_qi, .value)
## # A tibble: 61 × 8
## # Groups:   .variable, district [61]
##    .variable   district  .value .lower  .upper .width .point .interval
##    <chr>          <int>   <dbl>  <dbl>   <dbl>  <dbl> <chr>  <chr>    
##  1 b_Intercept       NA -0.536  -0.715 -0.369    0.95 median qi       
##  2 r_district         1 -0.454  -0.864 -0.0464   0.95 median qi       
##  3 r_district         2 -0.0482 -0.757  0.610    0.95 median qi       
##  4 r_district         3  0.301  -0.702  1.35     0.95 median qi       
##  5 r_district         4  0.343  -0.239  0.964    0.95 median qi       
##  6 r_district         5 -0.0297 -0.592  0.510    0.95 median qi       
##  7 r_district         6 -0.275  -0.773  0.197    0.95 median qi       
##  8 r_district         7 -0.216  -0.945  0.478    0.95 median qi       
##  9 r_district         8  0.0236 -0.567  0.603    0.95 median qi       
## 10 r_district         9 -0.162  -0.866  0.453    0.95 median qi       
## # ℹ 51 more rows

gather_draws(m1, b_Intercept, r_district[district, ]) %>% 
  with_groups(c(.variable, district), median_qi, .value) %>% 
  ggplot(aes( x=district, y=.value)) +
  geom_pointinterval( aes(ymin = .lower, ymax = .upper), 
                      alpha=.5) +
  labs(y="District distance from mean") +
  coord_flip()
## Warning: Removed 1 row containing missing values or values outside the scale range
## (`geom_segment()`).


\[\begin{align*} C &\sim \text{Bernoulli}(p_i) \\ \text{logit}(p_i) &= \alpha + \alpha_{D[i]} + \beta U_i + \beta_{D[i]}U_i \\ \alpha, \beta &\sim \text{Normal}(0, 1) \\ \alpha_{D[j]} &\sim \text{Normal}(0, \sigma_{D}) \\ \beta_{D[j]} &\sim \text{Normal}(0, \tau_{D}) \\ \sigma, \tau &\sim \text{Exponential}(1) \\ \end{align*}\]

m2 <- brm(
  data=d,
  family=bernoulli,
  use.contraception ~ 1 + urban + (1 + urban || district),
  prior = c( prior(normal(0, 1), class = Intercept), 
             prior(normal(0, 1), class = b),
             prior(exponential(1), class = sd)),     

  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.2"))

Oops, no divergent transitions.

m2
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: use.contraception ~ 1 + urban + (1 + urban || district) 
##    Data: d (Number of observations: 1934) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~district (Number of levels: 60) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.48      0.09     0.32     0.67 1.01     1290     2067
## sd(urban)         0.55      0.21     0.11     0.96 1.00      860      912
## 
## Regression Coefficients:
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept    -0.70      0.09    -0.88    -0.53 1.00     2275     2893
## urban         0.63      0.15     0.33     0.92 1.00     2391     2077
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

more than one type of cluster

McElreath doesn’t cover this in his video lecture, but this is from the textbook and worth discussing.

data(chimpanzees, package="rethinking")
d <- chimpanzees
str(d)
## 'data.frame':    504 obs. of  8 variables:
##  $ actor       : int  1 1 1 1 1 1 1 1 1 1 ...
##  $ recipient   : int  NA NA NA NA NA NA NA NA NA NA ...
##  $ condition   : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ block       : int  1 1 1 1 1 1 2 2 2 2 ...
##  $ trial       : int  2 4 6 8 10 12 14 16 18 20 ...
##  $ prosoc_left : int  0 0 1 0 1 1 1 1 0 0 ...
##  $ chose_prosoc: int  1 0 0 1 1 1 0 0 1 1 ...
##  $ pulled_left : int  0 1 0 0 1 1 0 0 0 0 ...

From McElreath:

The data for this example come from an experiment aimed at evaluating the prosocial tendencies of chimpanzees (Pan troglodytes). The experimental structure mimics many common experiments conducted on human students (Homo sapiens studiensis) by economists and psychologists. A focal chimpanzee sits at one end of a long table with two levers, one on the left and one on the right in this figure. On the table are four dishes which may contain desirable food items. The two dishes on the right side of the table are attached by a mechanism to the right-hand lever. The two dishes on the left side are similarly attached to the left-hand lever.

When either the left or right lever is pulled by the focal animal, the two dishes on the same side slide towards opposite ends of the table. This delivers whatever is in those dishes to the opposite ends. In all experimental trials, both dishes on the focal animal’s side contain food items. But only one of the dishes on the other side of the table contains a food item. Therefore while both levers deliver food to the focal animal, only one of the levers delivers food to the other side of the table.

There are two experimental conditions. In the partner condition, another chimpanzee is seated at the opposite end of the table, as pictured in the figure. In the control condition, the other side of the table is empty. Finally, two counterbalancing treatments alternate which side, left or right, has a food item for the other side of the table. This helps detect any handedness preferences for individual focal animals.

When human students participate in an experiment like this, they nearly always choose the lever linked to two pieces of food, the prosocial option, but only when another student sits on the opposite side of the table. The motivating question is whether a focal chimpanzee behaves similarly, choosing the prosocial option more often when another animal is present. In terms of linear models, we want to estimate the interaction between condition (presence or absence of another animal) and option (which side is prosocial).


unique(d$actor)
## [1] 1 2 3 4 5 6 7
unique(d$block)
## [1] 1 2 3 4 5 6
unique(d$prosoc_left)
## [1] 0 1
unique(d$condition)
## [1] 0 1

We could model the interaction between condition (presence/absence of another animal) and option (which side is prosocial), but it is more difficult to assign sensible priors to interaction effects. Another option, because we’re working with categorical variables, is to turn our 2x2 into one variable with 4 levels.

d$treatment <- factor(1 + d$prosoc_left + 2*d$condition)
d %>% count(treatment, prosoc_left, condition)
##   treatment prosoc_left condition   n
## 1         1           0         0 126
## 2         2           1         0 126
## 3         3           0         1 126
## 4         4           1         1 126

In this experiment, each pull is within a cluster of pulls belonging to an individual chimpanzee. But each pull is also within an experimental block, which represents a collection of observations that happened on the same day. So each observed pull belongs to both an actor (1 to 7) and a block (1 to 6). There may be unique intercepts for each actor as well as for each block.

Mathematical model:

\[\begin{align*} L_i &\sim \text{Binomial}(1, p_i) \\ \text{logit}(p_i) &= \bar{\alpha} + \alpha_{\text{ACTOR[i]}} + \bar{\gamma} + \gamma_{\text{BLOCK[i]}} + \beta_{\text{TREATMENT[i]}} \\ \beta_j &\sim \text{Normal}(0, 0.5) \text{ , for }j=1..4\\ \alpha_j &\sim \text{Normal}(0, \sigma_{\alpha}) \text{ , for }j=1..7\\ \gamma_j &\sim \text{Normal}(0, \sigma_{\gamma}) \text{ , for }j=1..7\\ \bar{\alpha} &\sim \text{Normal}(0, 1.5) \\ \bar{\gamma} &\sim \text{Normal}(0, 1.5) \\ \sigma_{\alpha} &\sim \text{Exponential}(1) \\ \sigma_{\gamma} &\sim \text{Exponential}(1) \\ \end{align*}\]


m3 <- 
  brm(
    family = bernoulli,
    data = d, 
    bf(
      pulled_left ~ a + b, 
      a ~ 1 + (1 | actor) + (1 | block), 
      b ~ 0 + treatment, 
      nl = TRUE),
    prior = c(prior(normal(0, 0.5), nlpar = b),
              prior(normal(0, 1.5), class = b, coef = Intercept, nlpar = a),
              prior(exponential(1), class = sd, group = actor, nlpar = a),
              prior(exponential(1), class = sd, group = block, nlpar = a)),
  chains=4, cores=4, iter=2000, warmup=1000,
  seed = 1,
  file = here("files/data/generated_data/m71.3")
  )

m3
##  Family: bernoulli 
##   Links: mu = logit 
## Formula: pulled_left ~ a + b 
##          a ~ 1 + (1 | actor) + (1 | block)
##          b ~ 0 + treatment
##    Data: d (Number of observations: 504) 
##   Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup draws = 4000
## 
## Multilevel Hyperparameters:
## ~actor (Number of levels: 7) 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(a_Intercept)     2.04      0.66     1.12     3.63 1.01     1424     2076
## 
## ~block (Number of levels: 6) 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(a_Intercept)     0.21      0.17     0.01     0.63 1.00     1587     1660
## 
## Regression Coefficients:
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## a_Intercept      0.58      0.71    -0.76     2.01 1.00     1136     1789
## b_treatment1    -0.13      0.30    -0.71     0.46 1.00     2102     2948
## b_treatment2     0.40      0.30    -0.20     0.99 1.00     1820     2576
## b_treatment3    -0.48      0.30    -1.06     0.11 1.00     1937     2509
## b_treatment4     0.28      0.30    -0.29     0.89 1.00     1910     2502
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

posterior_summary(m3)
##                              Estimate Est.Error          Q2.5        Q97.5
## b_a_Intercept            5.835058e-01 0.7051870 -7.574382e-01    2.0133561
## b_b_treatment1          -1.285035e-01 0.3000948 -7.111530e-01    0.4608178
## b_b_treatment2           3.968713e-01 0.2977210 -2.006941e-01    0.9864687
## b_b_treatment3          -4.770727e-01 0.3000809 -1.057934e+00    0.1067693
## b_b_treatment4           2.815642e-01 0.2984428 -2.944552e-01    0.8881553
## sd_actor__a_Intercept    2.036780e+00 0.6561266  1.116480e+00    3.6306757
## sd_block__a_Intercept    2.088196e-01 0.1743707  8.311932e-03    0.6295848
## r_actor__a[1,Intercept] -9.391636e-01 0.7061219 -2.360315e+00    0.3928665
## r_actor__a[2,Intercept]  4.108310e+00 1.3648897  2.040292e+00    7.2360335
## r_actor__a[3,Intercept] -1.245105e+00 0.7113142 -2.669128e+00    0.1290459
## r_actor__a[4,Intercept] -1.245720e+00 0.7152689 -2.676802e+00    0.1138210
## r_actor__a[5,Intercept] -9.385328e-01 0.7167749 -2.405589e+00    0.4292875
## r_actor__a[6,Intercept]  1.203465e-03 0.7156267 -1.441203e+00    1.3720687
## r_actor__a[7,Intercept]  1.526423e+00 0.7600257  4.334721e-02    3.0016306
## r_block__a[1,Intercept] -1.664840e-01 0.2169600 -7.073766e-01    0.1194874
## r_block__a[2,Intercept]  3.467096e-02 0.1698465 -2.979211e-01    0.4263189
## r_block__a[3,Intercept]  4.830452e-02 0.1783436 -2.839705e-01    0.4701920
## r_block__a[4,Intercept]  1.138782e-02 0.1732665 -3.519582e-01    0.3951538
## r_block__a[5,Intercept] -2.934102e-02 0.1689017 -4.049284e-01    0.3082666
## r_block__a[6,Intercept]  1.078886e-01 0.1934904 -1.977606e-01    0.5785745
## lprior                  -6.336549e+00 1.2258524 -9.231991e+00   -4.5017996
## lp__                    -2.866412e+02 3.7297778 -2.947833e+02 -280.1872116

m3 %>% 
  mcmc_plot(variable = c("^r_", "^b_", "^sd_"), regex = T) +
  theme(axis.text.y = element_text(hjust = 0))


as_draws_df(m3) %>% 
  select(starts_with("sd")) %>% 
  pivot_longer(everything()) %>% 
  ggplot(aes(x = value, fill = name)) +
  geom_density(linewidth = 0, alpha = 3/4, adjust = 2/3, show.legend = F) +
  annotate(geom = "text", x = 0.67, y = 2, label = "block", color = "#5e8485") +
  annotate(geom = "text", x = 2.725, y = 0.5, label = "actor", color = "#0f393a") +
  scale_fill_manual(values = c("#0f393a", "#5e8485")) +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle(expression(sigma["group"])) +
  coord_cartesian(xlim = c(0, 4))
## Warning: Dropping 'draws_df' class as required metadata was removed.


exercise

Return to the data(Trolley) from an earlier lecture. Define and fit a varying intercepts model for these data, with responses clustered within participants. Include action, intention, and contact. Compare the varying-intercepts model and the model that ignores individuals using both some method of cross-validation.


solution

data(Trolley, package="rethinking")

# fit model without varying intercepts
m_simple <- brm(
  data = Trolley,
  family = cumulative, 
  response ~ 1 + action + intention + contact, 
  prior = c( prior(normal(0, 1.5), class = Intercept) ),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m71.e1")
)

# fit model with varying intercepts
m_varying <- brm(
  data = Trolley,
  family = cumulative, 
  response ~ 1 + action + intention + contact + (1|id), 
  prior = c( prior(normal(0, 1.5), class = Intercept),
             prior(normal(0, 0.5), class = b),
             prior(exponential(1), class = sd)),
  iter=2000, warmup=1000, cores=4, chains=4,
  file=here("files/data/generated_data/m71.e2")
)

solution

# compare models using WAIC cross-validation
m_simple  <- add_criterion(m_simple , "loo")
m_varying <- add_criterion(m_varying, "loo")

loo_compare(m_simple, m_varying, criterion = "loo") %>% 
  print(simplify=F)
##           elpd_diff se_diff  elpd_loo se_elpd_loo p_loo    se_p_loo looic   
## m_varying      0.0       0.0 -15669.2     88.7       354.2      4.6  31338.4
## m_simple   -2876.0      86.2 -18545.1     38.1         9.2      0.0  37090.3
##           se_looic
## m_varying    177.5
## m_simple      76.2

# posterior predictive check
pp_check(m_simple, ndraws = 5, type="hist") +
  ggtitle("Simple Model")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

pp_check(m_varying, ndraws = 5, type="hist") +
  ggtitle("Varying Intercepts Model")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.


predictions

Posterior predictions in multilevel models are a bit more complicated than single-level, because the question arises: predictions for the same clusters or predictions for new clusters?

In other words, do you want to know more about the chimps you collected data on, or new chimps? Let’s talk about both.

predictions for chimps in our sample

Recall that the function fitted() give predictions.

labels <- c("R/N", "L/N", "R/P", "L/P")

nd <- distinct(d, treatment, actor) %>% 
  mutate(block=1)

f <- fitted(m3,newdata = nd) %>% 
  data.frame() %>% 
  bind_cols(nd) %>% 
  mutate(treatment = factor(treatment, labels = labels))

f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_line(color="#0f393a") +
  facet_wrap(~actor)


# observed p
obs = d %>% 
  filter(block==1) %>% 
  group_by(actor, treatment) %>% 
  summarise(p = mean(pulled_left), .groups = "drop") %>% 
  mutate(treatment = factor(treatment, labels = labels))


f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_point( aes(y=p), 
              data=obs, 
              shape=1) +
  geom_line(color="#0f393a") +
  facet_wrap(~actor)

We can add in the observed probabilities.


predictions for new chimps

Even here, we have some choice. Let’s start by predicting scores for the average chimp.

post <- as_draws_df(m3)
avg_chimp = post %>% select(starts_with("b_")) %>% 
  pivot_longer(-b_a_Intercept) %>% 
  mutate(
    fitted = b_a_Intercept + value,
    fitted = inv_logit_scaled(fitted),
    treatment=factor(str_remove(name, "b_b_treatment"),
                     labels=labels)
  ) %>% 
  group_by(treatment) %>% 
  median_qi(fitted)
## Warning: Dropping 'draws_df' class as required metadata was removed.
avg_chimp
## # A tibble: 4 × 7
##   treatment fitted .lower .upper .width .point .interval
##   <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 R/N        0.613  0.295  0.866   0.95 median qi       
## 2 L/N        0.725  0.409  0.918   0.95 median qi       
## 3 R/P        0.526  0.228  0.820   0.95 median qi       
## 4 L/P        0.701  0.381  0.908   0.95 median qi

f %>% 
  ggplot( aes(x=treatment, y=Estimate, group=1) ) +
  geom_ribbon(aes( ymin=Q2.5, ymax=Q97.5 ), 
              fill = "#0f393a",
              alpha=.3) +
  geom_point( aes(y=p), 
              data=obs, 
              shape=1) +
  geom_line(color="#0f393a") +
  geom_ribbon(aes( x=treatment, ymin=.lower, ymax=.upper ), 
              alpha=.3,
              fill="#e07a5f",
              data=avg_chimp,
              inherit.aes = F) +
  geom_line( aes(y=fitted), data=avg_chimp, color ="#e07a5f") +
  geom_line(color="#0f393a") +
  facet_wrap(~actor)


But the average chimp is only one possible chimp we could encounter. Let’s simulate 100 possible chimps.

post %>% 
  slice_sample(n=100) %>% 
  # simulate chimps
  mutate(a_sim = rnorm(n(), mean = b_a_Intercept, sd = sd_actor__a_Intercept)) %>% 
  pivot_longer(b_b_treatment1:b_b_treatment4) %>% 
  mutate(fitted = inv_logit_scaled(a_sim + value)) %>% 
  mutate(treatment = factor(str_remove(name, "b_b_treatment"),
                            labels = labels)) %>%
  ggplot(aes(x = treatment, y = fitted, group = .draw)) +
  geom_line(alpha = 1/2, color = "#e07a5f") +
  coord_cartesian(ylim = 0:1) 
## Warning: Dropping 'draws_df' class as required metadata was removed.


exercise

Returning to the Trolley data and the varying intercept model, get predictions for…

  1. a subset of 3 participants in the dataset.
  2. the average participant.
  3. 2 new participants.

Hint: don’t forget that the model uses a link function. You may need to play with arguments or fiddle around with the outputs of your functions.


solution

3 participants

part3 = sample( unique(Trolley$id) , size=3, replace=F )
nd <- distinct(Trolley, action, intention, contact, id) %>% 
  filter(id %in% part3)

f <- fitted(m_varying, newdata = nd, scale = "response") %>% 
  data.frame() %>% 
  bind_cols(nd) 

f %>% 
  pivot_longer(-c(action:id),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p") +
  facet_grid(action+contact~id) +
  theme(legend.position = "bottom")


solution

The average participant

nd <- distinct(Trolley, action, intention, contact) 

f <- fitted(m_varying, newdata = nd, scale = "response", 
            re_formula = NA) %>% 
  data.frame() %>% 
  bind_cols(nd) 

f %>% 
  pivot_longer(-c(action:contact),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p") +
  facet_grid(action~contact) +
  theme(legend.position = "bottom")


Two new participants

# create data for 2 new participants
nd <- distinct(Trolley, action, intention, contact) %>%
  slice(rep(1:n(), times = 2)) %>%
  mutate(id = rep(c("New1", "New2"), each = n()/2))

# get predictions including random effects
f <- fitted(m_varying, newdata = nd, 
            scale = "response", allow_new_levels=T) %>% 
  data.frame() %>% 
  bind_cols(nd) 

# plot
f %>% 
  pivot_longer(-c(action:id),
               names_sep = "\\.{3}",
               names_to = c("stat", "response")) %>% 
  pivot_wider(names_from = stat, values_from = value) %>% 
  mutate(response = str_sub(response, 1, 1)) %>% 
  ggplot(aes(x=response, y=Estimate.P.Y, fill=as.factor(intention))) +
  geom_bar(stat="identity", position="dodge") +
  labs(y="p", fill="intention") +
  facet_grid(action+contact~id) +
  theme(legend.position = "bottom")


  1. this video is a great explanation↩︎